(57) ABSTRACT



A two-level Internet search service system. Within the 
platform of the two-level Internet search service system, 
Web site builders or Webmasters will have better control on 
correctly presenting indexes of their contents; Web users 
will have better control to clarify what they are looking for, 
can give feedback on service improvement, and can give 
their opinion for which Web site is more relevant to their 
query. In the two-level Internet search service system the 
searches are distinguished into search service provider (SSP) 
level and in-site level. The objective of SSP level search is
to bring Web users to the right Web destination. The objec- 
tive of in-site level is to find exact information Web users are 
seeking. 
TWO-LEVEL INTERNET SEARCH SERVICE
SYSTEM 



CROSS-REFERENCE TO RELATED 
APPLICATION 

[0001] This application claims the benefit of U.S. Provisional
Patent Application Serial No. 60/203,943, filed May 12, 2000.

BACKGROUND OF THE INVENTION 
[0002] 1. Field of the Invention 

[0003] The present invention relates to resource discovery 
on the Internet, especially to the functions currently per- 
formed by search engines and directories. 

[0004] 2. Description of Related Art 

[0005] Search engines play a vital role in finding things in 
cyberspace. The first of these search engines, Archie, main- 
tained a database of approximately 1,500 host computers, 
which housed files accessible through the file transfer pro- 
tocol (FTP) space of the Internet. FTP sites were feasible in 
the early days of the Internet when the number of users and 
host computers was relatively small. The next advancement 
in search engine technology produced a more user friendly 
system called Gopher, a subject-based, menu-driven guide 
for finding information on the Internet. Gopher searched all 
the files located on a particular host; however, it was 
impossible to know if all information relevant to a search 
resided on one host. Visiting all 6,000 Gopher servers would 
be an incredibly time-consuming task. Therefore, the search 
tool Veronica was developed to search Gopher space. 

[0006] With the advent of the World Wide Web, it became 
necessary to create a new tool for cataloging information 
found on this portion of the Internet. Unlike a real-world 
landscape, the contours of the Web constantly shift as sites 
come and go, necessitating constant "reconnaissance" in
order to provide users with an accurate map of its offerings. 
This "map", which is really a catalog, is at the core of the 
search engine. When a user inputs a query into a dialog box, 
the user is not searching the Web per se. Rather, the user is 
searching an index of the Web created and continually 
updated by the search engine. 

[0007] The first Web search engine was World Wide Web
Wanderer. It used an automatic search agent called a 'robot'
to track the Web's growth. Then there was a search engine
called ALIWEB. ALIWEB didn't use a search agent.
Instead, it asked people to write descriptions of their Web 
service and register at ALIWEB. ALIWEB then periodically 
retrieved information from registered Web servers and com- 
bined them into a search database. Soon robots-based search 
engines and editor-based directories became the popular 
tools for Internet search. The search agent works like a chain 
reaction. It starts from one Web page and follows the 
out-links to other Web pages and then repeats the same 
pattern on each Web page it finds. 

[0008] Early search engines focused on 'breadth first' and 
'depth first' search agents. Referred to colloquially as 
'robots' or 'spiders', these agents were set loose onto the
Web to index HyperText Mark-up Language (HTML) files 
residing on the myriad of servers connected to the Internet. 



[0009] The 'breadth first' approach works by 'hydroplaning'
over the expanse of the Web, taking note of any
hyperlink references found in a file, but deferring any deeper
inquiry in favor of moving on to cover as much territory as
possible. In contrast, a 'depth first' approach works by
homing in on a site, dropping anchor and thoroughly exploring
every pointer leading from the file, drilling down until
the search agent finds a file with no links outward bound.
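
By way of illustration only, the two traversal orders described above can be sketched as follows. The link graph `LINKS` and the function names are hypothetical and are not taken from any actual search engine; the sketch assumes each page is visited at most once.

```python
from collections import deque

# Hypothetical link graph (page -> out-links), used only for illustration.
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": [],
}

def crawl_breadth_first(start, links):
    """Visit pages level by level, deferring deeper links until later."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

def crawl_depth_first(start, links):
    """Follow each chain of links down to a page with no out-links first."""
    seen, order, stack = set(), [], [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        # Push in reverse so that the first out-link is explored first.
        stack.extend(reversed(links.get(page, [])))
    return order
```

On this small graph the breadth first agent notes both out-links of page A before descending, whereas the depth first agent drills down through B to D before returning to C.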

[0010] Search engines operate in a three-step procedure.
First, specialized types of software, referred to as 'robots', 
'spiders' or 'crawlers', go out and retrieve information about 
Web sites. The search engine either has found these Web 
sites itself or the Web sites (through their Webmasters, those 
in charge of Web site management) have asked to be 
indexed. Some search engines tend to "index" (record by 
word) all of the terms on a given web site. Some may index 
only the terms within the first few sentences, the web site 
title, or the site's metatag, which is not viewable on the 
actual page and contains a short summary description provided
by the site designer. Search engines must re-sample
the web sites periodically to detect any change since last 
indexing. 

[0011] The second step is to store the indexes in a database 
with ranking for each web page. The rank reflects the 
relevance of the web pages to certain keywords. A propri- 
etary algorithm is used to evaluate the index. Since different 
search engines employ different algorithms, the ranking 
results are not consistent from all the search engines. When 
an Internet user types a keyword or keywords as search 
query, search engines retrieve indexed information which 
matches the query from the database. This last step com- 
pletes the service of search engines. 
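
As a minimal sketch of the index, rank and retrieve steps described above, the following assumes a simple word-count score in place of the proprietary ranking algorithms, which are not disclosed. The function names and the scoring rule are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def build_index(pages):
    """Record, for every word, the pages containing it and how often."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][url] = count
    return index

def search(index, query):
    """Return pages containing every query word, highest word count first."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    matches = set(postings[0]).intersection(*postings[1:])

    def score(url):
        # Assumed stand-in for a proprietary rank: total occurrences.
        return sum(posting[url] for posting in postings)

    return sorted(matches, key=lambda url: (-score(url), url))
```

A query is answered entirely from the stored index, which is consistent with the observation above that the user searches an index of the Web rather than the Web itself.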

[0012] Internet directories operate on a different principle. 
They require human editors to view an individual web site 
and determine its placement into a subject classification 
scheme or taxonomy. Once done, certain keywords associ- 
ated with those sites can be used for searching the directo- 
ry's database to find web sites of interest, or people can 
follow the structure of the directory to find the information 
located under the directory structure. 

[0013] Coverage by the search engines and directories
affects Internet usage significantly. People rely heavily
on the search engines and directories to use the
Internet. According to a recent study, two-thirds to
three-quarters of all users cite finding information as one of their
primary uses of the Internet and more than 98% of active
Web users rely on the Internet to find reference material,
30% on a daily basis and a further 40% on a weekly basis.
The major Internet search engines (HotBot, Northern Light
and AltaVista) individually catalog at most 16% of the
Internet's sites. As the number of web pages has increased,
coverage has declined over the past two years.
Even with the results from all search engines combined, total
Internet coverage is only about 42%. Due to the cost and
time in individually assigning sites to categories and the 
editorial policy used by directory companies, lack of cov- 
erage is also a problem for Internet directories. 

US 2001/0039563 A1    Nov. 8, 2001

[0014] Although some search engine companies (Google
and Inktomi) now claim coverage of over 1 billion Web
pages, there is more content than current search engine
companies can cover. More than one million new pages are
created every day. A growing amount of content is not HTML
text, e.g. Adobe's Portable Document Format (PDF) files,
other formatted files, and multimedia files. There is also much
non-crawlable content, such as sites that have no links
pointing to them, sites screened by a login, corporate
intranets, sites that use robots.txt files to bar search robots, and
deep content. Studies show that the "invisible Web" contains
deep content amounting to nearly 550 billion Web pages, most of
which are open to the public but never touched by search
engines. It is estimated that more than 100,000 deep Web
sites exist. Another reason that search engines have difficulty
finding all information on the Web is the structure of the
Web, which has a bow-tie shape according to a recent study. A
large cluster of the Web contains Web pages that
cannot be reached by links.

[0015] Some researchers see the coverage problem as 
damage to the intention of the Internet as a public good. The 
Internet as a public community embodies the ideals of a 
liberal democratic society. It is a rich array of commercial, 
political, academic, and artistic activities that fosters asso- 
ciations and communications of all people around the world, 
and provides a virtually endless supply of information. As 
technology progresses, it is certain that there will be more 
Internet applications. If trends on Internet directories and 
search engines lead to a narrowing of options, the Internet as 
the kind of public good that many people envisioned will be 
seriously undermined. 

[0016] The information retrieved from search engines
is often not very relevant. The indexing methods
used by current search engines often misrepresent the
contents of the indexed Web sites. Web site builders don't have
much control over what Web users learn about
their Web site. To increase a Web site's chance of being indexed
correctly and placed higher in the "found
sites" list, Web designers need to spend extra effort making
the Web site suit the search engines. This is always a
confusing job because each search engine uses a different
algorithm for indexing, and many keep it a secret.

[0017] Since the relevance is poor, as Web users conduct 
searches by using search engines, they suffer so-called 
"information overload", i.e. too much irrelevant information 
and no efficiency. In the worst cases, submitting broad query 
terms to search engines can result in literally hundreds of 
thousands of potential Web pages identified. Many times 
users also get the same Web site and/or pages repeatedly 
appearing in the found result. To find what they want, users 
usually need to visit several search service Web sites.

[0018] Recency is poor. Fifty percent of Internet users cite
searches that turn up broken links as one of their typical
search problems. The bigger the search engine service is, the
higher the percentage of dead links it has. It seems that there
is a trade-off between comprehensiveness and recency.
Reducing the time between re-samplings is a big challenge
for search engines, and it would also unreasonably increase
the load on each visited Web site's server. There is a considerable
backlog on the directory service; for example, it can take six
months for Yahoo! to put a site under its directory, if the
editor decides the Web site is suitable to be included. Therefore,
recency will be a serious problem as the Web grows
rapidly.

[0019] With current search engines and directories, Web
users have little say in obtaining a better search
service. Because the Internet grows so rapidly, a
self-improving search service is necessary for Web users.



[0020] Metadata, structured data about data, as a way to 
improve Internet searching has been proposed by the Dublin
Core Metadata Initiative since 1995. The World Wide Web
Consortium (W3C), under the leadership of Tim Berners- 
Lee, also proposed Resource Description Framework (RDF) 
for broader Internet applications including resource discov- 
ery. However, these metadata standards have to be recog- 
nized by search engines. Currently without support by any 
of the major search engines, there is no reason for Web site 
builders to put them into the Web pages and the Internet 
community cannot benefit from them. 

[0021] The related art includes articles that call for
improvements in searching on the Internet. "Searching the
Web: General and Scientific Information Access" and
"Accessibility of Information on the Web" by Steve Lawrence et al.,
and "Defining the Web: The Politics of Search Engines" by
Lucas Introna et al. discuss limitations of current search
engines.

[0022] Inventions of interest, as depicted in patents, 
include U.S. Pat. No. 5,283,731, issued on Feb. 1, 1994 to 
James E. Lalonde et al., which describes a computer based 
classified ad system and method. 

[0023] U.S. Pat. No. 5,319,542, issued on Jun. 7, 1994 to 
John E. King, Jr. et al., describes an electronic catalog 
ordering process and system. 

[0024] U.S. Pat. No. 5,649,186, issued on Jul. 15, 1997 to 
Gregory J. Ferguson, describes a system and computer based
method providing a dynamic information clipping service. 

[0025] U.S. Pat. No. 5,721,910, issued on Feb. 24, 1998 to 
Sandra S. Unger et al., describes a database system and a 
method of producing a database which can be used to assign 
scientific or technical documents, such as patents and/or 
technical or scientific publications and/or abstracts of these 
patents or publications, to one or more scientific or technical 
categories within a multidimensional hierarchical system 
which reflects the business, scientific or technical interests 
of a business, scientific or technical entity or specialty. 

[0026] U.S. Pat. No. 5,727,156, issued on Mar. 10, 1998 to 
Dirk Herr-Hoyman et al., describes a method and apparatus 
for posting hypertext documents to a hypertext server so as 
to make the hypertext documents accessible to users of the 
hypertext document system while securing against unautho- 
rized modification of the posted hypertext documents. 

[0027] U.S. Pat. No. 5,745,882, issued on Apr. 28, 1998 to 
Matthew J. Bixler et al., describes an interface for an 
electronic classified advertising system that includes the 
capability for the user to enter search criteria for an item of 
interest, to save the search criteria and to be notified by the 
system when an item matching the search criteria is entered 
into the system. 

[0028] U.S. Pat. No. 5,794,236, issued on Aug. 11, 1998 to 
Joseph P. Mehrle, describes a computer based system that 
will classify a legal document into a location within a legal 
hierarchy. 

[0029] U.S. Pat. No. 5,799,284, issued on Aug. 25, 1998 
to Roy E. Bourquin, describes a computer system that 
utilizes client/server software to allow users of the client 
software to log into a server and publish information about 
a product or service. 

[0030] U.S. Pat. No. 5,855,013, issued on Dec. 29, 1998 to
Dave C. Fisk, describes a method and apparatus for creating 
and maintaining a computer database using a virtual index 
system. 

[0031] U.S. Pat. No. 5,870,717, issued on Feb. 9, 1999 to 
Charles F. Wiecha, describes a system for ordering items 
over a computer network using an electronic catalog. 

[0032] U.S. Pat. No. 5,963,951, issued on Oct. 5, 1999 to 
Gregg Collins, describes a computerized on-line dating 
service that provides user-controlled perusal of search 
results. 

[0033] U.S. Pat. No. 5,974,409, issued on Oct. 26, 1999 to
Sankrant Sanu et al., describes an enhanced find system and 
method for locating offerings within an interactive on-line 
network. 

[0034] U.S. Pat. No. 6,009,410, issued on Dec. 28, 1999 to 
Suzanne L. LeMole et al., describes a method and system for 
presenting customized advertising to a user on the World 
Wide Web. 

[0035] International Patent document WO 98/19417,
published on May 7, 1998, describes an integrated computer-
implemented corporate information delivery system. 

[0036] None of the above inventions and patents, taken 
either singly or in combination, is seen to describe the 
instant invention as claimed. 

SUMMARY OF THE INVENTION 

[0037] The present invention is a two-level Internet search 
service system. An Internet search service system should 
have five basic features to meet Web users' needs:
comprehensiveness, recency, relevancy, efficiency, and
self-improvement. The two-level Internet search service system
according to the invention is targeted to meet all of them and
will provide a superior search result relative to current
search engines and Internet directories. The two-level Inter- 
net search service system can provide a feasible way to keep 
up with the growing speed of the Internet. The two-level 
Internet search service system can also reduce a Web site 
server's load by eliminating search engines' visits. Within 
the platform of the two-level Internet search service system, 
Web site builders or Webmasters will have better control on 
correctly presenting indexes of their contents; Web users 
will have better control to clarify what they are looking for, 
can give feedback on service improvement, and can give 
their opinion for which Web site is more relevant to their 
query. With this two-level Internet search service system,
it will eventually be possible to provide a one-stop
search at one Web site, instead of requiring visits to multiple
Web sites as in the current situation.

[0038] In the two-level Internet search service system the 
searches are distinguished into two levels: search service 
provider (SSP) level and in-site level. The objective of SSP 
level search is to bring Web users to the right Web destina- 
tion. The objective of in-site level is to find exact informa- 
tion Web users are seeking. At SSP level three tasks need to 
be accomplished. The first is getting the correct search 
indexes about Web sites, which is the base for other tasks; 
the second is organizing the indexes at the search service's 
Web site; and the third is helping people search efficiently. 
The three components of SSP level search service system 
include (1) input system, (2) data organizer at the search 
service's Web site, and (3) interactive search service pro- 
vided by the SSP. 



[0039] At in-site level three tasks need to be accom- 
plished. The first is improving Web units' navigation sys- 
tem; the second is implementing metadata standards; and the 
third is installing an in-site search engine to utilize the
metadata. Three components are needed for in-site level 
search service: (1) a design aid system, (2) an authoring tool, 
and (3) an in-site search engine tool kit. 

[0040] Accordingly, it is a principal object of the invention 
to provide a two-level Internet search service system com- 
prising Web unit submission means, detecting and sorting 
means, multi-directory and database means, interactive 
search service means, and in-site search means. 

[0041] It is another object of the invention to provide a 
software means for automatically generating standard index 
data and multi-directory entries and sending them back to 
SSP's server. 

[0042] It is another object of the invention to provide a 
data organizer at SSP's server to process and store the data
from submission. 

[0043] It is another object of the invention to provide 
software means for interactive search activities among Web 
users, Web unit builders or Webmasters, and SSP in both 
SSP level search and in-site level search.

[0044] It is an object of the invention to provide improved 
elements and arrangements thereof in a two-level Internet 
search service system for the purposes described which is 
inexpensive, dependable and fully effective in accomplish- 
ing its intended purposes. 

[0045] These and other objects of the present invention 
will become readily apparent upon further review of the 
following specification and drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0046] FIG. 1 is a block diagram of the SSP level search 
service system according to the invention. 

[0047] FIG. 2 illustrates the internal structure of a web 
site. 

[0048] FIG. 3 is the flow chart of the sorting and detecting 
program according to the invention.

[0049] FIG. 4 describes how a web unit can be placed in 
different categories of directories and indexed. 

[0050] FIG. 5 is the keyword or phrase search procedure 
flow chart according to the invention. 

[0051] FIG. 6 is a block diagram of the second function of
search service at SSP level. 

[0052] FIG. 7 is a block diagram of in-site level search 
service according to the invention. 

[0053] Similar reference characters denote corresponding 
features consistently throughout the attached drawings. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

[0054] In the description of the preferred embodiment, 
reference is made to the accompanying drawings which 
form a part hereof, and in which is shown by way of 
illustration the specific embodiment in which the invention 
may be practiced. It is to be understood that other
embodiments may be utilized and structural changes may be made
without departing from the scope of the present invention.

[0055] The present invention is an Internet search service 
system. When an Internet user retrieves Web pages, they use 
a browser to transmit HyperText Transfer Protocol (HTTP) 
commands from their computer to a Web server executed by 
a connected computer. In turn, the Web server responds with 
an HTML (or other formatted) page that is transmitted to the 
browser for display to the user. 

[0056] Typically, users access Web pages by using an SSP
(Yahoo.com, AltaVista.com, etc.) to find pages regarding a 
topic of interest. If the Web page is of some interest to the 
users, they "bookmark" the HTTP Uniform Resource Loca- 
tor (URL) for that page in their browser in order to easily 
find the Web page in the future. 

[0057] FIG. 1 is a block diagram of an exemplary hard- 
ware environment of the preferred embodiment of the 
present invention, and more particularly, illustrates a typical 
distributed computer system, wherein client computers, or 
users, are connected via a network to server computers, or 
sites. A typical combination of resources may include clients 
that are personal computers or workstations, and servers that 
are personal computers, workstations, minicomputers, and/ 
or mainframes. The network preferably comprises the Inter- 
net, although it could also comprise intranets, extranets, 
local area networks, wide area networks, etc. As shown, Web 
sites 10 are accessed by Web users 12 via the Internet. The 
Web users 12 may utilize the services provided by the 
inventive Internet search service system 20 at SSP level. The 
Internet search service system 20 includes a sorting and 
detecting means 22, search service means 24, which 
includes a server 28, and directories and database means 26. 

[0058] Each of the computers, be it client or server,
generally includes a processor, random access memory, data
storage devices, data communications devices, a monitor, 
user input devices, etc. Those skilled in the art will recognize 
that any combination of the above components, or any 
number of different components, peripherals, and other 
devices, may be used with the client and server. 

[0059] For the purpose of indexing, the Internet search 
service system 20 is based on a new concept of the Internet. 
Currently, people use Web sites, Web pages, and Web 
documents to describe the information entities on the World 
Wide Web. Basically directories use Web sites as the search 
result, while search engines use Web pages. The proper way 
to index and retrieve information is to use a Web unit as the 
information entity. 

[0060] A Web site is the collection of total contents of a 
Web entity and has only first level URL, e.g. 

[0061] http://www.fastcompany.com/

[0062] http://www.mlb.ilstu.edu/ 

[0063] http://movies.yahoo.com/ 

[0064] For some Web sites, which are hosted by a hosting 
Web site, the URLs look like 

[0065] http://maxpages.com/dbnursery 

[0066] A Web unit is a self-contained and distinguishable 
sub entity of the Web site. Its URL has several levels. 
Usually a navigation structure is built for a Web unit so that 



its users can go around the Web unit in a manageable and 
easy manner. Examples of Web units are: 

[0067]-[0068] http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html

[0069] http://hometown.aol.com/algen4me/page/index.htm

[0070] http://www.lakewoodconferences.com/wp 

[0071] http://www.ncsa.uiuc.edu/demoweb/ 

[0072] http://www.neci.neci.com/~lawrence/science98.html

[0073] Here a personal home page, the second example, is 
treated as a Web unit, because it is not big but needs to be 
indexed. A research paper, which may be a single Web page 
as the last example shows, is also treated as a Web unit. A 
Web site may have several Web units in it or can be treated 
also as a single Web unit if the Web site is not very large. 

[0074] A Web page is an element of the Web unit. It has
the maximum levels of URL within the Web site or Web unit. 
A Web unit (even a Web site) may only have one Web page, 
but generally a Web unit has several Web pages. Examples 
of Web pages are 

[0075] http://www.lakewoodconferences.com/wp/introduction.html

[0076] http://www.lib.berkeley.edu/TeachingLib/

[0077] The second example is called an intermediate Web 
page. An intermediate page is a bridge to more Web units. 
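
The URL "levels" referred to in the definitions above can be illustrated with a short sketch. The helper names `url_levels` and `parent_site` are hypothetical, and counting path segments is only an assumed approximation: it does not by itself distinguish a Web unit from a Web page, and a hosted Web site such as the maxpages.com example has one extra level.

```python
from urllib.parse import urlsplit

def url_levels(url):
    """Count the non-empty path segments that follow the host name."""
    return len([part for part in urlsplit(url).path.split("/") if part])

def parent_site(url):
    """Return the first-level URL of the Web site that contains this URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"
```

Under this reading, http://movies.yahoo.com/ has zero levels below the host, http://www.ncsa.uiuc.edu/demoweb/ has one, and a Web page inside a Web unit has at least as many levels as the unit that contains it.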

[0078] FIG. 2 shows a typical internal structure of a Web 
site 30. The Web site 30 has a home page 32. From the home 
page 32 the navigation structure brings people to Web units 
36 or an intermediate Web page 34. From the intermediate 
Web page 34 people can reach more Web units 36 at a lower 
level. Web units 36 can have one page or several pages. 
Within a Web unit 36 there may be more complicated 
structures. There are also some hyperlinks among Web units 
36 or between two Web pages in different Web units 36. 
FIG. 2 doesn't show the hyperlinks, because they are not 
critical for the indexing purpose. 

[0079] The invention uses a Web unit as the core element 
to catch the identity and functionality of a Web site. Indexing
only the entire Web site risks losing too much
information. Many Web sites contain hundreds or
even thousands of Web pages. A brief description in the
directory won't work well to tell what can be found in a Web 
site. Indexing each page is not necessary because within a 
Web unit all the pages are related and integrated into a 
whole. Therefore, the two-level Internet search service sys- 
tem indexes Web units at SSP level and directs Web users to 
matching Web units. Once Web users enter a Web unit they 
should be guided by the in-site searching means, which is
described later, and easily find what they are looking for. 

[0080] The process of SSP-level search is as follows.
Web unit builders or Webmasters come to the SSP's Web site
and register at the input Web unit. Then they download
submission software from the SSP 28. The software contains
a tutorial for submission, an index form, a multi-directory
entry form, a suggestion form, and an update reminding
program. They fill in the index form, multi-directory
selection form, and suggestion form and send them back to the
SSP's server. At the SSP's server, the input information is
checked against the SSP's editing standard, sorted, and stored.
When Web users come to the SSP's Web site for an Internet
search, they can follow the directory structure or send a word
query to the SSP's search engine. The SSP uses the submitted
information to help Web users narrow the search scope. The SSP
also gives Web users the opportunity to provide feedback, post
unsolved problems, rate the Web units they have visited, and
report any error they find in their search process. While Web
users are searching the Web units, the SSP also provides a
personal directory service and a personal agent to improve the
efficiency of search at in-site level.

[0081] As shown in FIG. 3, the first component at SSP 
level search of the Internet search service system is self- 
submission. The key is a new input form, which is used to 
catch all the vital information about a Web unit. There can 
be as many as twenty information items needed to catch the 
identity of a Web unit. Conventional search engines and
directories ask for only one URL and one e-mail
address from each Web site. Some also ask for a brief
description of the Web site. Current search engines and
directories therefore never obtain the information gathered by
this method. The following table shows some examples of
information items needed for submission. Not all the items 
are necessary for every Web unit and also the list may be 
expanded to improve the accuracy. 

Submission Items Table 

[0082] 1. Title 

[0083] 2. URL of the Web unit 

[0084] 3. URL of the parent Web site 

[0085] 4. Brief description 

[0086] 5. Primary keywords and phrases 

[0087] 6. Secondary keywords and phrases 

[0088] 7. Total number of the Web pages in the Web unit 

[0089] 8. Entity name (company, organization, or per- 
son) and geographic location 

[0090] 9. Off-line companion for not-pure-Internet 
entity 

[0091] 10. Author's name (for academic papers and 
knowledge materials) 

[0092] 11. Product or category names and brands 

[0093] 12. Targeted users (geographic location and user
group - age, gender, occupations, etc.)

[0094] 13. Membership information

14. Specifications - size, text based, multimedia based, etc.

[0095] 15. Language information — Chinese, Dutch, 
English, French, German and others 

[0096] 16. Special requirements needed to view the 
Web unit 

[0097] 17. Geographic location of Web site

[0098] 18. Update information

[0099] 19. Contact information

[0100] 20. Other special features
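
For illustration only, a subset of the submission items listed above could be held in a record such as the following. The field names, the defaults, and the use of JSON as the "standard data format" are all assumptions; the specification does not fix a particular format.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class WebUnitSubmission:
    """Hypothetical record covering a subset of the submission items."""
    title: str                       # item 1
    unit_url: str                    # item 2
    site_url: str                    # item 3
    description: str = ""            # item 4
    primary_keywords: list = field(default_factory=list)    # item 5
    secondary_keywords: list = field(default_factory=list)  # item 6
    page_count: int = 1              # item 7
    language: str = "English"        # item 15
    contact: str = ""                # item 19

    def to_json(self):
        """Serialize the record; JSON is an assumed stand-in format."""
        return json.dumps(asdict(self), sort_keys=True)
```

Items that do not apply to a given Web unit are simply left at their defaults, in keeping with the statement that not all items are necessary for every Web unit.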



[0101] Web unit builders or Webmasters will download a 
submission software means 38 from search service 24, 
which contains a guide and registration form for submission 
40 process. The submission program 38 contains a tutorial, 
a Web unit index form, a multi-directory entry form, a
suggestion form, and update reminding software. The
tutorial will explain the web unit concept, index form, 
directory structure, and relationship between indices and 
directory entries. The index form contains blanks for Web 
unit builders or Webmaster at Web site 10 to fill in index 
items. The multi-directory entry form contains the directory 
selection structure for Web unit builders or Webmasters at 
Web site 10 to select categories, branches and nodes. They 
can select as many locations in the directory structure as they
think suitable, as shown in FIG. 4. Both the index form
and the multi-directory entry form will automatically gen- 
erate standard data format to be sent to SSP's server. The 
suggestion form is used for Web unit builders or Webmasters 
at Web site 10 to suggest new branches and nodes in the 
directory structure, because most likely the initial design of 
the directory structure may need redefining and reconstruct- 
ing to reflect the rich and complex Web contents. The 
updating reminding software will stay at Web site 10 and 
automatically detect new changes in the site's contents. 
Whenever there is a change, it will generate a reminding 
message to Web unit builders or the Webmaster to submit the 
new information. The basic features of comprehensiveness 
and recency can be achieved through this input system. Also 
the database at the SSP's server will be more condensed and
richer in information, for the same number of Web sites
covered, than conventional search engines' databases.
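
One possible way for the update reminding software to detect changes is to compare content digests between runs. The specification does not state how changes are detected, so the digest approach and the names `fingerprint` and `changed_pages` below are assumptions offered purely as a sketch.

```python
import hashlib

def fingerprint(pages):
    """Map each page path to a SHA-256 digest of its current content."""
    return {
        path: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for path, body in pages.items()
    }

def changed_pages(old, new):
    """Return the paths that were added, removed, or modified."""
    return sorted(
        path for path in old.keys() | new.keys()
        if old.get(path) != new.get(path)
    )
```

A non-empty result would trigger the reminding message to the Web unit builder or Webmaster; because the comparison runs at the Web site itself, no visit by a search robot is needed.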

[0102] The second component at SSP-level search of the
Internet search service system is the data organizer. The
information or Web unit input 62 gathered from Web unit builders
or Webmasters at Web site 10 will go through sorting and
detecting means 22 and be stored in database 26. The database
26 can be further divided (by sorting and detecting 60) into 
a categorized directory 66 (seen in FIG. 4) to organize the 
Web units and an index database 64 to store all the input 
information. Human editors are also involved in the data 
organizing activities. 

[0103] The sorting and detecting program works as shown 
in FIG. 3. After the Internet search service system 20 
receives a submission, the sorting and detecting program 
will go through several checking steps (42, 44, 46, and 48) 
to find whether there is an error in the submission form 40, 
i.e. errors in the index form, misplacement in directory entry, 
or other errors that violate the SSP's editing policy. Any 
error will be recorded at 58. After all the checking is 
finished, step 50 will check whether any error has been recorded 
at 58. If there is an error, a return form 56 will be sent to the 
submitting Web site to ask for a resubmission. Otherwise, a 
sorting program 52 will sort and send the data to a database 
54. One thing which can be controlled by the detecting is the 
ratio 44 of primary and secondary keywords and phrases. 
The rule of thumb for the ratio is less than 1/3, i.e. every one 
primary keyword should have at least three secondary 
keywords. For any submission which has more than fifteen 
keywords or phrases, the keywords or phrases must be 
divided into primary and secondary. If the ratio is not 
reasonable, a resubmission is required. A limit is placed on 
the length of the description 46. Another possible limit is the 
number of product names 48. If the number of product names is 
over a certain amount, the category name is required to 
replace the product names. 
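
The checking steps described above may be sketched, purely as an illustration, in the following form. The fifteen-keyword threshold and the three-secondary-per-primary rule come from the description; the description length limit and the product name limit are not given in the specification and are assumed values, as are all names in the code. The sketch also assumes that the ratio is checked only for submissions exceeding fifteen keywords.

```python
from dataclasses import dataclass, field
from typing import List

MAX_UNDIVIDED_KEYWORDS = 15   # from the description
MAX_DESCRIPTION_CHARS = 500   # assumed value, not given in the description
MAX_PRODUCT_NAMES = 20        # assumed value, not given in the description


@dataclass
class Submission:
    primary: List[str] = field(default_factory=list)
    secondary: List[str] = field(default_factory=list)
    description: str = ""
    product_names: List[str] = field(default_factory=list)


def detect_errors(sub: Submission) -> List[str]:
    """Run the keyword ratio, description length and product name checks."""
    errors = []
    total = len(sub.primary) + len(sub.secondary)
    if total > MAX_UNDIVIDED_KEYWORDS:
        if not sub.primary or not sub.secondary:
            errors.append("keywords must be divided into primary and secondary")
        elif 3 * len(sub.primary) > len(sub.secondary):
            # Rule of thumb: at least three secondary keywords per primary one.
            errors.append("fewer than three secondary keywords per primary keyword")
    if len(sub.description) > MAX_DESCRIPTION_CHARS:
        errors.append("description too long")
    if len(sub.product_names) > MAX_PRODUCT_NAMES:
        errors.append("too many product names; use a category name instead")
    return errors


def handle(sub: Submission):
    """Return the submission to the sender on any error, otherwise pass it on for sorting."""
    errors = detect_errors(sub)
    return ("return form", errors) if errors else ("sorted", [])
```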

[0104] There may be more checking items as the input 
form becomes more complex. Also the ratio of primary and 
secondary keywords and the limit on only submitting pri- 
mary keywords will be adjustable for accuracy. Web users 
will play a role in detecting errors in submitted information, 
as described later. Their contributions will be integrated into 
the detecting process. 

[0105] The directory structure is categorized into infor- 
mation, online shopping, online service, entertainment, busi- 
ness-to-business, and community, as shown in FIG. 4. As 
the Internet grows, there may be new categories added to the 
Internet search service system 20. Under each category, a 
complete directory forms a tree structure with branches and 
nodes. Therefore, there are as many directories as categories. 
A Web unit may be placed in several directories. There will 
be some submissions that cannot clearly identify where the 
Web units belong. In this case the Web unit builder or 
Webmaster can suggest a new directory branch and node. 
The editors at the Internet search service system will use 
their expertise to place those Web units and modify the 
directories. In the index database 64, the same keyword or 
phrase may have different ranks, depending on whether the 
Web unit builders or Webmaster submitted them as primary 
or secondary. 
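
One possible in-memory representation of the multi-directory structure is sketched below for illustration. The six category names are taken from the description; the class names, method names and the example site are hypothetical, and a production system would presumably keep the directory in database 26 rather than in memory.

```python
CATEGORIES = [
    "information", "online shopping", "online service",
    "entertainment", "business-to-business", "community",
]


class DirectoryNode:
    """A branch or node of one directory tree."""

    def __init__(self, name):
        self.name = name
        self.children = {}
        self.web_units = set()

    def child(self, name):
        # Create the branch on first use so that editors can extend the tree.
        if name not in self.children:
            self.children[name] = DirectoryNode(name)
        return self.children[name]


class MultiDirectory:
    """One tree per category; a Web unit may be placed in several trees."""

    def __init__(self):
        self.roots = {c: DirectoryNode(c) for c in CATEGORIES}

    def place(self, web_unit, path):
        category, *branches = path
        node = self.roots[category]  # KeyError signals an unknown category
        for branch in branches:
            node = node.child(branch)
        node.web_units.add(web_unit)

    def locations(self, web_unit):
        """Return every directory path under which the Web unit is listed."""
        found = []

        def walk(node, trail):
            if web_unit in node.web_units:
                found.append(tuple(trail))
            for name in sorted(node.children):
                walk(node.children[name], trail + [name])

        for category in CATEGORIES:
            walk(self.roots[category], [category])
        return found
```

For example, a bookshop placed under both "online shopping" and "information" would be returned by `locations` with two paths, one per directory.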

[0106] The third component of the Internet search service 
system 20 is the search service 24. The primary search 
methods are following the directory structure and using a 
keyword or phrase query. As the input system gets more and 
more information from Web unit builders or Webmasters, 
the search can also have other forms. The search process is 
an interaction between SSP and Web users. The search 
service 24 uses information collected to help Web users 
narrow down the result to a condensed and relevant Web 
units list. It provides a personalized directory system and a 
personal agent to Web users. It also provides a platform for 
self-improvement. 

[0107] The first function of the search service is the 
keyword or phrase search, which is shown in FIG. 5. When 
the information retrieved 70 exceeds a certain amount, the 
search service program 24 will pop up question 72 to 
narrow down the search scope and help make the found 
result more relevant to the user's query. Here, most of the 
information gathered from the self-submission 40 is used as 
the variable items 76 to eliminate non-related Web units 
from the found list 70 and generate a new narrowed list 78. 
For example, for one search query there are 200 matching 
Web units, but the "target users" from all 200 Web units are 
different. 

[0108] This information can be used by a user to decide 
which kind of "target user" he/she is. Then all the Web units 
that match his/her choice of "target user" will form a new 
and narrowed list. This process is continued until the user 
feels comfortable to look at the list. Then they will enter 74, 
where they can also get help from a search agent described 
later. The features of relevancy and efficiency of the inven- 
tion are clearly shown. The found results will follow a rank 
to be shown to the user. The matching between search query 
and primary keyword/phrase will have a higher rank than 
secondary keyword/phrase does. User ratings, described 
below, will also be used to improve the ranking. Through the 
first function of the search service, the relevancy issue of 
Internet search can be improved dramatically. 
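
The narrowing and ranking behavior described in the two preceding paragraphs may be illustrated by the following sketch. It assumes each Web unit is held as a dictionary of submitted index items; the field names ("primary", "secondary", "target_users", "rating") and the use of the user rating as a tie-breaker within each rank are assumptions made for the example only.

```python
def rank_matches(query, web_units):
    """Return units matching the query, primary keyword matches first.

    Within each rank, units with a higher user rating come first.
    """
    q = query.lower()
    scored = []
    for unit in web_units:
        if q in (k.lower() for k in unit["primary"]):
            tier = 0
        elif q in (k.lower() for k in unit["secondary"]):
            tier = 1
        else:
            continue
        scored.append((tier, -unit.get("rating", 0.0), unit["name"], unit))
    scored.sort(key=lambda entry: entry[:3])
    return [entry[3] for entry in scored]


def narrowing_question(found, item):
    """List the distinct values of one index item to offer as choices."""
    return sorted({unit[item] for unit in found if item in unit})


def narrow(found, item, choice):
    """Keep only the Web units whose index item equals the user's choice."""
    return [unit for unit in found if unit.get(item) == choice]
```

A user would be shown the values returned by `narrowing_question` for an item such as "target_users", and `narrow` would be applied repeatedly until the list is short enough to browse.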

[0109] As shown in FIG. 6, the second function of the 
search service is to provide a personal directory 84 and 
search agent 88 to Web users 12. The personal directory 84 
will use the multi-directory structure to record Web units and/or 
Web pages that have been visited by individual Web users. 
The personal directory can help the Internet search service 
system 20 set up a profile for individual Web users. The 
Internet search service system 20 will use this information to 
recommend Web units to Web users. It will also use updated 
information from the self-submission process to automati- 
cally update (by update and analysis software 90) the 
personal directories for Web users and inform them of the 
changes instantly. The update and analysis software 90 also 
analyzes search patterns of individuals and gives sugges- 
tions for search improvement and recommends Web units, as 
noted above. 

[0110] Web users can use the search agent 88 in two 
different ways. They can start from the found list 86 after the 
narrow down process or use the personal directory without a 
new search. The personal agent will help Web users at the in-site 
level search. It can initiate in-site search engines at Web sites 
10 with a Web user's query. It can automatically go to other 
Web units in the visiting list, which is formed from either the found 
list 86 or the same branch of the personal directory 84, to 
find related Web pages while the user is in the current Web 
unit. The search agent together with the first function will 
greatly increase the search efficiency. 

[0111] This relationship is indicated by the arrow between 
84 and 88. This arrow indicates means to allow communi- 
cation between the search agent and the personal directory 
so that the search agent can carry search requests to the Web 
units in the personal directory to initiate the in-site search 
engine tool kit for a more specific search. Optionally, the 
search agent can act as a Web robot to search a Web unit 
following the navigation structure built into the Web unit. 

[0112] Another important function of the user's service is 
self-improvement. Web users 12 give feedback, rating 
scores, and requests to the system 20. Several aspects are 
included in the self-improvement process: 

[0113] Relevancy rating and detecting — Web users can 
rate the relevancy of the Web unit that they are visiting. They 
can also report any error or spam in the data collected about 
the Web unit. Their rating scores will be integrated into the 
index database 64 to improve the ranking accuracy and any 
error or spam detected will be corrected through the editorial 
procedure of the system; 

[0114] Supporting community — users with similar search 
interests can use this channel to exchange experiences and 
help each other on the search; 

[0115] User feedback — Web users can make suggestions 
for search service improvement. When the feedback is 
transmitted to the Web unit builders or Webmasters, it may 
include improvement suggestions from SSP experts; and, 

[0116] Unsolved query posting — any unsuccessful search 
will be recorded and posted to ask for help from the Internet 
community. A Web unit will be utilized for posting unsolved 
search problems related to Internet search and Internet use 
and for providing solutions to the posted problems. Any 
unknown word will also be posted to ask for explanation and 
references. By doing that, the inventive Internet search 
service system 20 will gradually cover every word and 
phrase that people are searching for. This is another way to 
improve the comprehensiveness of the Internet search. By 
involving content providers and seekers, the Internet search 
service system can also solve the critical problem faced by 
current search engines and Internet directories, i.e. keeping 
up with the Internet's growing speed. 
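
As an illustration of the relevancy rating and detecting aspect above, the following sketch keeps a running average of user scores per Web unit and queues error or spam reports for the editorial procedure. The 1-to-5 score range and all names are assumptions; the specification does not define a rating scale or how scores are combined.

```python
class RatingBook:
    """Collect relevancy scores and error/spam reports from Web users."""

    def __init__(self):
        self._totals = {}   # Web unit -> (sum of scores, number of scores)
        self.reports = []   # (Web unit, reason) pairs awaiting editorial review

    def rate(self, unit, score):
        if not 1 <= score <= 5:  # assumed rating scale
            raise ValueError("score must be between 1 and 5")
        total, count = self._totals.get(unit, (0, 0))
        self._totals[unit] = (total + score, count + 1)

    def average(self, unit):
        """Return the mean score, or None if the unit has not been rated."""
        total, count = self._totals.get(unit, (0, 0))
        return total / count if count else None

    def report(self, unit, reason):
        self.reports.append((unit, reason))
```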

[0117] At the in-site level search, Web users can follow a 
navigation system built inside of the Web unit or use an 
in-site search engine. The personal search agent can help 
them enhance search efficiency by searching other Web units 
on their visiting list while they are surfing in the current Web 
unit. 

[0118] As shown in FIG. 7, at the in-site level search 
service the Internet search service system 20 performs three 
tasks to improve the search at Web sites 10 for Web users 12. 
First, it uses Web users' feedback 104 to help Web unit 
builders or Webmasters at Web sites 10 to design a better 
navigation system by using design aid 106. Second, it 
implements metadata standards by letting Web unit builders 
or Webmasters use an authoring tool 108. Third, it installs an 
in-site search engine 110 to enhance the search capability for 
those Web sites or Web units that have used metadata. 

[0119] The first component of in-site search service is the 
design aid 106. Web users' feedback 104 is the major source 
for reconstruction of the Web site or unit, while other 
sources may also provide hints for improvement. The feed- 
back will be in two forms. The first one is the generic 
feedback form that can be used for any Web unit or Web site. 
The other one is the customized feedback form that meets 
special needs for certain Web sites or Web units. By col- 
lecting and analyzing the data, the Internet search service 
system 20 will generate a suggestion form for Web sites or 
Web units. Then Web unit builders or Webmasters will work 
with experts in the Internet search service system 20 to 
reconstruct the Web site or Web unit. Web unit builders or 
Webmasters can also choose to use design tools and 
testing tools provided by the Internet search service system 
20. The improved site or unit structure and navigation 
system can help Web users to surf around and find infor- 
mation more easily when they next visit the Web site or Web 
unit. 

[0120] The second component of in-site search service is 
the authoring tool 108. Web unit builders or Webmasters will 
download this tool. The tool employs software means to 
make the metadata writing easy. It will use simple forms to 
let users fill in the data and then generate standard data 
format for the in-site search engine 110 to use. 
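
A minimal sketch of how the authoring tool might turn a filled-in form into a standard data format is given below. The specification does not name a metadata standard or a field list; JSON output and the three fields shown are assumptions chosen only to make the example concrete.

```python
import json

REQUIRED_FIELDS = ("title", "description", "keywords")  # assumed form fields


def build_metadata(form):
    """Validate a filled-in form and emit one metadata record as JSON text."""
    missing = [name for name in REQUIRED_FIELDS if not form.get(name)]
    if missing:
        raise ValueError("missing form fields: " + ", ".join(missing))
    record = {
        "title": form["title"].strip(),
        "description": form["description"].strip(),
        # Keywords are entered comma-separated; normalize and de-duplicate them.
        "keywords": sorted(
            {k.strip().lower() for k in form["keywords"].split(",") if k.strip()}
        ),
    }
    return json.dumps(record, sort_keys=True)
```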

[0121] The third component of in-site search service is the 
in-site search engine tool kit 110. This tool kit will integrate 
in-site robots, databases, and search engines for Web unit 
builders or Webmasters. The tool kit can be a generic one or 
a customized one, depending on the needs of Web unit 
builders and Webmasters. The in-site search engine has the 
capability to communicate with Web users' personal search 
agent. When Web users use the personal search agent to help 
them on the search, the agent will carry their query or 
command and travel to the Web site or Web unit on the 
visiting list. When it gets there it will find the in-site search 
engine and convey the search request. The in-site search 
engine then will start the search and bring the result back to 
the Web user's browser. Web users can also initiate the in-site 
search engine themselves. 
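
The exchange between the personal search agent and the in-site search engines may be sketched as follows. For the purposes of the example each in-site search engine is modeled as a local function rather than a remote service, and simple substring matching stands in for whatever search method the tool kit would actually use; all names and the example sites are hypothetical.

```python
def make_in_site_engine(pages):
    """Build a toy in-site search engine over a mapping of URL -> page text."""
    def search(query):
        q = query.lower()
        return sorted(url for url, text in pages.items() if q in text.lower())
    return search


def run_agent(query, visiting_list, in_site_engines):
    """Carry the query to each Web unit on the visiting list and collect results."""
    results = {}
    for unit in visiting_list:
        engine = in_site_engines.get(unit)
        if engine is None:
            # No in-site search engine installed; the agent could instead act
            # as a Web robot here, which this sketch does not implement.
            continue
        results[unit] = engine(query)
    return results
```

In this sketch the results gathered from the other Web units on the visiting list would be returned to the Web user's browser while the user continues in the current Web unit.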

[0122] The in-site level search is very focused and efficient. 
It builds the search mechanism on Web users' needs 
and the unique purposes of the individual Web site or unit. 
Resource description and retrieval are localized. Navigating 
the Web unit is easy. The in-site search engine works 
fast on a relatively small database. Web users will have more 
satisfying search experiences from an improved and 
user-friendly in-site search. 

[0123] It is to be understood that the present invention is 
not limited to the sole embodiment described above, but 
encompasses any and all embodiments within the scope of 
the following claims. 

I claim: 

1. A system for conducting an Internet search service 
comprising: 

a personal directory for recording Web units or Web pages 
visited; 

a personal search agent for conducting a further search; 

an in-site search engine tool kit. 

2. The system of claim 1 further including software means 
which automatically updates said personal directory to 
reflect changes in Web units and in directory structure and 
for informing Web users of the changes immediately. 

3. The system of claim 1 further including software means 
to analyze individual's search pattern and give Web users 
search improvement suggestions and recommend new Web 
units/sites based on the analysis. 

4. The system of claim 1 further including a user feedback 
form for users to rate the Web units under certain quality 
criteria, to report any erroneous Web unit submission, and to 
suggest improvement on search quality. 

5. The system of claim 4 further including means to 
transmit said feedback forms to Web unit builders or Web- 
masters. 

6. The system of claim 1 further including a Web unit to 
post any unsolved search query for the Internet community 
to provide a solution. 

7. The system of claim 1 further including means to allow 
communication between said search agent and said personal 
directory wherein said search agent can carry search 
requests to the Web units in said personal directory to 
initiate said in-site search engine tool kit for a more specific 
search or act as a Web robot to search a Web unit following 
the navigation structure built into the Web unit. 

8. The system of claim 1 further including a software 
means to let said in-site search engine tool kit perform 
communication with said personal search agent regarding 
Web users' search needs. 

9. The system of claim 1 further including an authoring 
tool for Web unit builders or Webmasters to create metadata 
for in-site search. 

10. A system for inputting data into an Internet search 
service system comprising: 

a search service provider server for guiding Web unit 
builders or Webmasters through an input process; 
a downloadable software means containing forms for 
self-submission by a Web unit builder or Webmaster. 

11. The system of claim 10 wherein said software means 
includes: 

an electronic index form for Web unit builders or Web- 
masters to fill in the necessary items to describe the 
content in their Web units; 

an electronic multi-directory entry form for Web unit 
builders or Webmasters to locate the branches and 
nodes for their Web units. 

12. The system of claim 11 wherein said software means 
includes means for generating a standard data format from 
entries in said electronic index and said electronic multi- 
directory and transmitting the data back to said search 
service provider server. 

13. The system of claim 11 wherein said software means 
further includes an electronic form for letting Web unit 
builders or Webmasters suggest new branches and nodes in 
multi-directory structure. 

14. The system of claim 11 wherein said software includes 
a tutorial explaining the Web unit concept. 

15. The system of claim 11 wherein said software includes 
means for reminding Web unit builders or Webmasters to 
submit the new information immediately. 

16. The system of claim 11 wherein said server includes 
means for detecting any update or change in registered Web 
sites. 

17. A system for organizing data into a search service 
provider server comprising: 

a database to store Web unit indices and other search 
related information; 

a multi-directory data structure to host categorized Web 
unit entries according to a subject classification 
scheme; and 

a software means to process input information from Web 
unit builders or Webmasters. 

18. The system of claim 17 wherein said software means 
includes means for detecting errors in input from Web unit 
builders or Webmasters. 

19. The system of claim 17 further including an e-mail 
means for sending feedback to Web unit builders or Web- 
masters regarding errors in their input. 



20. The system of claim 17 further including a sorting 
means for automatically transforming input data to said 
database or said multi-directory structure. 

21. A method for facilitating an Internet search compris- 
ing: 

providing a personal directory for recording Web units or 
Web pages visited; 

providing a personal search agent for conducting a further 
search; 

providing an in-site search engine tool kit. 

22. The method of claim 21 further including providing 
means to automatically update the personal directory to 
reflect changes in Web units and in directory structure and 
for informing Web users of the changes immediately. 

23. The method of claim 21 further including providing a 
software means to analyze individual's search pattern and 
give Web users search improvement suggestions and rec- 
ommend new Web units/sites based on the analysis. 

24. The method of claim 21 further including providing a 
user feedback form for users to rate the Web units under 
certain quality criteria, to report any erroneous Web unit 
submission, and to suggest improvement on search quality. 

25. The method of claim 24 further including providing 
means to transmit the feedback forms to Web unit builders 
or Webmasters. 

26. The method of claim 21 further including providing a 
Web unit to post any unsolved search query for the Internet 
community to provide a solution. 

27. The method of claim 21 further providing a means to 
facilitate communication between the search agent and the 
personal directory, wherein the search agent can carry search 
requests to the Web units in the personal directory to initiate 
the in-site search engine tool kit for a more specific search 
or act as a Web robot to search a Web unit following the 
navigation structure built into the Web unit. 

28. The method of claim 21 further providing a software 
means to let the in-site search engine tool kit perform 
communication with the personal search agent regarding 
Web users' search needs. 

29. The method of claim 21 further providing an author- 
ing tool for Web unit builders or Webmasters to create 
metadata for in-site search. 

***** 